ABSTRACT

The estimation of mixing proportions $p_1, p_2, \ldots, p_m$ in the
mixture density $f(x) = \sum_{i=1}^{m} p_i f_i(x)$ is often encountered
in agricultural remote sensing problems, in which case the $p_i$ usually
represent crop proportions. In these remote sensing applications,
component densities $f_i(x)$ have typically been assumed to be normally
distributed, and parameter estimation has been accomplished using
maximum likelihood (ML) techniques. In this paper we examine minimum
distance (MD) estimation as an alternative to ML where, in this
investigation, both procedures are based upon normal components.
Results indicate that ML techniques are superior to MD when component
distributions actually are normal, while MD estimation provides better
estimates than ML under symmetric departures from normality. When
component distributions are not symmetric, however, it is seen that
neither of these normal based techniques provides satisfactory results.
1. Introduction

A common objective in remote sensing is the estimation of the
proportions $p_1, p_2, \ldots, p_m$ in the mixture density

$$f(x) = p_1 f_1(x) + p_2 f_2(x) + \cdots + p_m f_m(x) \tag{1.1}$$

where m is the number of components (crops) in the mixture
and, for component i, $f_i(x)$ is a (possibly multivariate)
density. In past practice this density has been assumed to
be (multivariate) normal with X being the reflected energy
in four bands of the light spectrum, certain linear
combinations of these readings, or other derived "feature"
variables.  Generally  the  parameter  estimation  has  been 
accomplished  using  maximum  likelihood  techniques.  In  this 
paper  we  examine  the  use  of  minimum  distance  estimation  as 
an  alternative  to  maximum  likelihood  and  we  will  compare 
the  performance  of  the  two  estimation  techniques  when 
dealing  with  mixtures  of  normal  and  of  non-normal  densities 
with  varying  amounts  of  separation.  We  will  focus  on  the 
mixture  of  two  univariate  distributions  given  by 

$$f(x) = p f_1(x) + (1-p) f_2(x). \tag{1.2}$$

We are also assuming that only data from the mixture
distribution are available. Other sampling schemes in which
training samples from the component distributions are also
available have been discussed by Hosmer (1973),
Redner (1980), and Hall (1981) among others.

2.  Estimation  in  the  Mixture  of  Normals  Model 
In this section we will assume that $f_1(x)$ and $f_2(x)$ in (1.2)
are normal densities with mean and variance $\mu_1$, $\sigma_1^2$ and
$\mu_2$, $\sigma_2^2$ respectively, where it is assumed that all five
parameters $\mu_1$, $\sigma_1^2$, $\mu_2$, $\sigma_2^2$, $p$ are
unknown. Techniques for estimating these parameters will be discussed.

(a)  Maximum  Likelihood 

Several recent articles have dealt with the problem of obtaining the
maximum likelihood estimates of $\mu_1$, $\sigma_1^2$, $\mu_2$,
$\sigma_2^2$ and $p$ (Hasselblad (1966), Day (1969), Wolfe (1970),
Hosmer (1975), Fowlkes (1979), Lennington and Rassbach (1979),
and Redner (1980)). Since the likelihood function

$$L = f(x_1) f(x_2) \cdots f(x_n) \tag{2.1}$$

where n is the sample size, is not a bounded function in
this case (see Day (1969)), the objective in the maximum
likelihood approach is to find a local maximum of L. This
maximum is usually found by setting the partial derivatives
of log(L) with respect to each of the 5 parameters equal to
zero and solving the resulting set of equations, called the
likelihood equations. Since closed form solutions of these
equations do not exist, they must be solved using iterative
techniques. Hasselblad (1966) and Wolfe (1969) suggested that
these equations be solved by taking advantage of their
fixed point form. Redner (1980) and Redner and Walker (1982)
have pointed out that this fixed point technique is
essentially an application of the EM algorithm (see
Dempster, Laird and Rubin (1977)) with the only difference
being that using the EM algorithm, the estimates of $\sigma_1^2$ and
$\sigma_2^2$ at step k involve the updated kth step estimates of
$\mu_1$ and $\mu_2$.
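
The EM iteration described above can be sketched as follows. This is a minimal illustration in Python, not the authors' program: the function names, the convergence rule, and the simulated sample are assumptions made for the example. The variance updates use the means already updated in the same step, as noted in the text.

```python
import math
import random

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_two_normals(xs, p, mu1, var1, mu2, var2, max_iter=500, tol=1e-8):
    """EM iterations for the mixture p*N(mu1, var1) + (1-p)*N(mu2, var2)."""
    n = len(xs)
    for _ in range(max_iter):
        # E-step: posterior probability that each point came from component 1
        w = []
        for x in xs:
            a = p * normal_pdf(x, mu1, var1)
            b = (1.0 - p) * normal_pdf(x, mu2, var2)
            w.append(a / (a + b))
        s1 = sum(w)
        s2 = n - s1
        # M-step: means first, then variances about the updated means
        new_p = s1 / n
        new_mu1 = sum(wi * x for wi, x in zip(w, xs)) / s1
        new_mu2 = sum((1.0 - wi) * x for wi, x in zip(w, xs)) / s2
        new_var1 = sum(wi * (x - new_mu1) ** 2 for wi, x in zip(w, xs)) / s1
        new_var2 = sum((1.0 - wi) * (x - new_mu2) ** 2 for wi, x in zip(w, xs)) / s2
        change = abs(new_p - p) + abs(new_mu1 - mu1) + abs(new_mu2 - mu2)
        p, mu1, var1, mu2, var2 = new_p, new_mu1, new_var1, new_mu2, new_var2
        if change < tol:  # hypothetical stopping rule chosen for this sketch
            break
    return p, mu1, var1, mu2, var2

# Simulated sample: true p = .3, components N(0, 1) and N(4, 1)
random.seed(1)
sample = [random.gauss(0.0, 1.0) if random.random() < 0.3 else random.gauss(4.0, 1.0)
          for _ in range(500)]
est = em_two_normals(sample, 0.5, -1.0, 2.0, 5.0, 2.0)
print(est)
```

The sketch does not guard against a component variance collapsing toward zero, which is the unboundedness of L mentioned above; a practical implementation would need to detect that case.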

Fowlkes (1979), on the other hand, maximized the
likelihood function directly by utilizing a quasi-Newton
method for minimizing -log(L) and found that good starting
values were crucial for acceptable performance.
Hosmer (1975) stated that using the likelihood equations,
starting values were not a serious problem in his
experience. In order to determine which of the two
techniques seemed preferable in our simulation studies we
replicated simulations performed by Fowlkes in which
various sets of poor starting values were used to initiate
the minimization procedure. We simulated realizations from
the mixture utilized by Fowlkes and estimated the
parameters using both direct maximization and the EM
algorithm. The results of our simulations indicate that
the EM algorithm approach is preferable and hence we have
used this technique for obtaining MLEs in our simulations.
(b) Minimum Distance

Although  ML  estimation  procedures  are  known  to  have 
certain  optimality  properties,  their  sensitivity  to 
violations  of  the  underlying  assumptions  is  also 
recognized.  The  development  of  estimation  procedures  which 
perform  well  even  under  moderate  deviations  from 
assumptions  has  been  a  topic  of  major  interest  in  recent 
literature.  One  of  these  robust  procedures  which  has 
received recent attention is that of minimum distance (MD)
estimation introduced by Wolfowitz (1957). Parr and
Schucany (1980), for example, have shown that MD techniques
provide  robust  estimators  of  the  location  parameter  of  a 
symmetric  distribution.  Minimum  distance  estimation  has 
been  used  for  parameter  estimation  in  the  mixture  model  by 
Choi  and  Bulgren(1968)  and  MacDonald (1971)  with  some 
success  although,  to  our  knowledge,  the  question  of 
sensitivity  to  assumptions  in  this  setting  has  not  been 
addressed.  These  previous  authors  assumed  that  the 
parameters  of  the  component  distributions  were  known  and 
that  only  the  mixing  proportion (s)  was  to  be  estimated. 

In order to briefly describe minimum distance
estimation, we let $X_1, X_2, \ldots, X_n$ denote a random sample from
a population with distribution function F and let $F_n$
denote the empirical distribution function, i.e. $F_n(x) = k/n$
where k is the number of observations less than or equal
to x. Further, let $\mathcal{H} = \{H_\theta : \theta \in \Theta\}$
denote a family of distributions depending on the possibly vector
valued parameter $\theta$. The MD estimate of $\theta$ is that value of
$\theta$ for which the distance between $F_n$ and $H_\theta$ is
minimized. It is not necessary that $F \in \mathcal{H}$. Of course,
when a mixture of two normals is used as the projection family,
$H_\theta$ becomes

$$H_\theta(x) = p \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma_1} e^{-(y-\mu_1)^2 / (2\sigma_1^2)} \, dy + (1-p) \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma_2} e^{-(y-\mu_2)^2 / (2\sigma_2^2)} \, dy.$$


Certain considerations become obvious at this point.
First, we must define what we mean by the "distance"
between two distributions. Several such distance measures
have appeared in the literature. The reader is referred to
the article by Parr and Schucany (1980) for a discussion of
these measures. For our purposes we have chosen the
Cramér-von Mises distance, $W^2$, between distribution
functions $G_1$ and $G_2$ which is given by

$$W^2 = \int_{-\infty}^{\infty} [G_1(x) - G_2(x)]^2 \, dG_2(x).$$

In our setting a computing formula for the Cramér-von
Mises distance between $F_n$ and $H_\theta$ is given by

$$W^2 = \frac{1}{12n^2} + \frac{1}{n} \sum_{i=1}^{n} \left[ H_\theta(Y_i) - \frac{i - .5}{n} \right]^2$$

where $Y_i$ is the ith order statistic. The similarity
between $W^2$ and the sum of squared differences between the
empirical distribution function $F_n$ and $H_\theta$ used by Choi and
Bulgren (1968) should be noted.
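
As an illustration of the order-statistic computation, the sketch below evaluates the Cramér-von Mises distance between the empirical distribution function of a sample and an arbitrary continuous distribution function. It assumes the distance is taken as the integral of $(F_n - H_\theta)^2$ with respect to $H_\theta$, exactly as in the definition of $W^2$; a version scaled by n differs only by a constant factor and has the same minimizer. The function names are hypothetical.

```python
import math

def normal_cdf(x, mu, sigma):
    """Distribution function of N(mu, sigma^2) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def mixture_cdf(x, p, mu1, sigma1, mu2, sigma2):
    """Two-component normal mixture distribution function H_theta(x)."""
    return p * normal_cdf(x, mu1, sigma1) + (1.0 - p) * normal_cdf(x, mu2, sigma2)

def cvm_distance(sample, cdf):
    """W^2 = integral of (F_n - H)^2 dH, computed from the order statistics."""
    ys = sorted(sample)
    n = len(ys)
    total = sum((cdf(y) - (i - 0.5) / n) ** 2 for i, y in enumerate(ys, start=1))
    return 1.0 / (12.0 * n * n) + total / n

# Distance between a small sample and a candidate mixture
data = [-0.8, 0.1, 0.4, 2.7, 3.1, 3.6]
w2 = cvm_distance(data, lambda x: mixture_cdf(x, 0.5, 0.0, 1.0, 3.0, 1.0))
print(w2)
```

When every $H_\theta(Y_i)$ equals $(i - .5)/n$ the sum vanishes and the distance reaches its smallest possible value, $1/(12n^2)$.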

Another consideration involves the minimization
procedure to be employed in minimizing $W^2$. Parr and
Schucany used the IMSL quasi-Newton algorithm ZXMIN. Our
comparisons have shown, however, that the IMSL routine
ZXSSQ, which uses Marquardt's (1963) method for minimizing a
sum of squares, was significantly faster, usually taking no
more than half the time required by ZXMIN. In the
simulation studies reported in the next section we have
used the Marquardt minimization procedure when calculating
the MDE. It should be noted that minimization is subject
to the constraints $\sigma_1^2 > 0$, $\sigma_2^2 > 0$, and
$0 < p \le 1$. Another finding which deserves mention before
proceeding is that, similar to the technique we have chosen for
calculating the MLE, the MDE has the desirable property that it is
relatively insensitive to starting values.
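
The five-parameter Marquardt minimization is not reproduced here. As a much simpler illustration of minimizing the Cramér-von Mises sum of squares, the sketch below treats the special case in which the component parameters are assumed known and only p is estimated (the setting of Choi and Bulgren mentioned earlier). In that case each residual is linear in p, so the minimizer has a closed form, which is then clipped to [0, 1]. The function names and the simulated sample are assumptions made for the example.

```python
import math
import random

def normal_cdf(x, mu, sigma):
    """Distribution function of N(mu, sigma^2) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def md_proportion(sample, mu1, sigma1, mu2, sigma2):
    """Minimum distance estimate of p when both components are known."""
    ys = sorted(sample)
    n = len(ys)
    num = 0.0
    den = 0.0
    for i, y in enumerate(ys, start=1):
        c1 = normal_cdf(y, mu1, sigma1)
        c2 = normal_cdf(y, mu2, sigma2)
        target = (i - 0.5) / n
        # residual_i = p*(c1 - c2) + c2 - target, which is linear in p
        num += (c1 - c2) * (target - c2)
        den += (c1 - c2) ** 2
    p = num / den
    # A one-dimensional quadratic is minimized over [0, 1] by clipping
    return min(1.0, max(0.0, p))

# Simulated sample: true p = .25, components N(0, 1) and N(3, 1)
rng = random.Random(2)
sample = [rng.gauss(0.0, 1.0) if rng.random() < 0.25 else rng.gauss(3.0, 1.0)
          for _ in range(500)]
p_hat = md_proportion(sample, 0.0, 1.0, 3.0, 1.0)
print(p_hat)
```

With all five parameters unknown the residuals are no longer linear, which is why an iterative sum-of-squares method such as Marquardt's is needed in the general case.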

3.  Starting  Values 

In  order  for  the  estimators  discussed  in  the  previous 
chapter  to  be  used  in  practice,  starting  values  for  the 
iterative  procedures  must  be  provided.  We  have  chosen  to 
obtain  starting  values  in  this  two  component  univariate 
setting  using  a  partitioning  technique  which  is  very  easy 
to  implement.  In  the  discussion  to  follow  we  will  assume, 
without loss of generality, that $\mu_1 < \mu_2$. The technique
involves first obtaining the initial estimate of p,
denoted by $p_0$, and then estimating the remaining four
parameters given $p_0$. Under the current implementation,
only the 9 values .1, .2, ..., .9 are allowed as possible
values for $p_0$. For each allowable value of $p_0$ the sample
is divided into two subsamples:

$$Y_1, Y_2, \ldots, Y_{n_1} \quad \text{and} \quad Y_{n_1+1}, Y_{n_1+2}, \ldots, Y_n$$

where $Y_i$ is the ith order statistic and $n_1$ is $np_0$ rounded
to the nearest integer. The value for $p_0$ is that value of
$p_0$ for which $p_0(1-p_0)(m_2 - m_1)^2$ is maximized, where $m_j$ is

the  sample  median  of  the  jth  subsample.  The  criterion  used 

here  is  a  robust  counterpart  to  the  classical  cluster 

analysis  procedure  of  selecting  the  clusters  for  which  the 

within cluster sum-of-squares is minimized. It is easy to
show, however, that the within cluster sum-of-squares is
minimized in the two cluster case when
$p(1-p)(\bar{x}_2 - \bar{x}_1)^2$ is
maximized, where $\bar{x}_j$ is the sample mean of cluster j
and $p = n_1/n$ with $n_1$ the number of sample values placed in
cluster  1.  Such  a  clustering  is  based  upon  a  cut-point, 
c  ,  for  which  all  sample  values  below  c  are  assigned  to 
the  cluster  associated  with  population  1.  It  must  be 
observed,  however,  that  due  to  the  overlap  between  the  two 


mixture  distributions,  some  sample  points  assigned  to 
cluster  1  may  be  from  population  2  and  some  observations 
from  population  1  may  be  in  cluster  2.  The  effect  of  this 
truncation of the right tail in population 1 is that the
sample mean from cluster 1 is likely to underestimate $\mu_1$
while $\mu_2$ is likely to be overestimated. In addition $\sigma_1$ and
$\sigma_2$ are likely to be underestimated by $s_1$ and $s_2$. If we
assume that the overlap between the two populations is not
too severe, then the sample values in cluster 1 to the
left of $m_1$ are relatively pure observations from
population 1 in which case $m_1$ is a "good" estimate of the
population mean in the case of symmetric distributions.
This reasoning also indicates that $m_1$ and $m_2$ should
provide better estimates of $\mu_1$ and $\mu_2$ than would $\bar{x}_1$
and $\bar{x}_2$.

In order to estimate the variances of the component
distributions we again will depend upon the fact that the
values to the left of $m_1$ and to the right of $m_2$ are "pure"
samples from populations 1 and 2 respectively. Thus, we
will use only this portion of the data for estimation of
the sample variances. We have used the fact that the
semi-interquartile range of a standard normal distribution
is .6745, to estimate $\sigma_1$ by

$$\sigma_1^{(0)} = \frac{m_1 - r_1^{(25)}}{.6745}$$

where $r_j^{(q)}$ is the qth percentile from the jth cluster,
j = 1, 2. Similarly, $\sigma_2^{(0)} = [r_2^{(75)} - m_2]/.6745$.
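
The partitioning procedure of this section can be sketched as follows. The sketch assumes the criterion $p_0(1-p_0)(m_2 - m_1)^2$ over the nine candidate values of $p_0$, and the quartile-based scale estimates $(m_1 - r_1^{(25)})/.6745$ and $(r_2^{(75)} - m_2)/.6745$. The percentile convention (linear interpolation) and all function names are assumptions of this example, since the text does not specify them.

```python
import statistics

def percentile(sorted_vals, q):
    """Percentile by linear interpolation (q in [0, 100]); one of several conventions."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def starting_values(sample):
    """Return (p0, mu1, sigma1, mu2, sigma2) from the median-based partition."""
    ys = sorted(sample)
    n = len(ys)
    best = None
    for k in range(1, 10):                 # candidate p0 = .1, .2, ..., .9
        p0 = k / 10.0
        n1 = (n * k + 5) // 10             # n*p0 rounded to the nearest integer
        if n1 < 2 or n - n1 < 2:
            continue                       # skip partitions too small to summarize
        left, right = ys[:n1], ys[n1:]
        m1 = statistics.median(left)
        m2 = statistics.median(right)
        score = p0 * (1.0 - p0) * (m2 - m1) ** 2
        if best is None or score > best[0]:
            best = (score, p0, left, right, m1, m2)
    _, p0, left, right, m1, m2 = best
    # Use only the outer halves of each cluster, which are assumed "pure"
    sigma1 = (m1 - percentile(left, 25)) / 0.6745
    sigma2 = (percentile(right, 75) - m2) / 0.6745
    return p0, m1, sigma1, m2, sigma2

# Two well-separated, evenly spaced clusters of 50 points each
demo = [i / 10.0 for i in range(50)] + [100.0 + i / 10.0 for i in range(50)]
start = starting_values(demo)
print(start)
```

The returned scale values are standard deviations; they would be squared if the iterative procedure is parameterized by variances.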

In  the  next  section  we  will  discuss  the  results  of  a 
major  simulation  investigation  comparing  ML  and  MD 
estimation.  In  these  simulations  the  iterative  techniques 
were  initiated  by  the  starting  values  as  discussed  in  the 
previous  paragraph.  A  preliminary  simulation  investigated 
the  performance  of  the  starting  values  described  here.  In 
this preliminary study we compared the convergence
initiated from these starting values with that when the
iterative procedures are started at the true parameter
values. The convergence from these two starts was almost
always to the same parameter estimates, a result which
held for both the MLE and MDE. For this reason and results
to  be  shown  in  Section  4,  we  believe  this  starting  value 
procedure  to  be  adequate. 

4.  Simulation  Results 

In  the  previous  two  sections  we  have  discussed  ML  and 
MD  estimators  for  the  parameters  of  the  mixture  of  two 
distributions.  In  this  section  we  report  the  results  of 
simulations  designed  to  compare  these  two  estimators  when 
the  component  distributions  are  normal  and  when  they  are 
non-normal.  In  addition  we  have  made  our  comparisons  under 
varying  degrees  of  separation  between  the  two 
distributions.  All  computations  were  performed  on  the  CDC 
6600  at  Southern  Methodist  University. 

In  our  comparison  of  the  MDE  and  MLE  we  have  begun  by 
comparing  their  performance  when  the  normality  assumption 
is  valid,  i.e.,  when  the  component  distributions  actually 
are  normal.  We  should  mention  that  because  of  the 
optimality  properties  of  the  MLE  we  would  expect  that  the 
MLE  would  be  superior  in  this  situation.  Since  in  practice 
the  validity  of  the  normality  assumption  is  subject  to 
question,  we  are  also  very  interested  in  the  performance 
of the MDE and MLE when the component distributions are
not normal. To this end we have simulated mixtures in
which the component distributions are distributed as a
Student's t with 4 degrees of freedom. We simulated 500
samples of size n = 100 from mixtures of normal and of t(4)
components for each of the following parameter
configurations:

Mixing  proportion 
.25 
.50 
.75 

Variances 


The  nature  of  the  mixture  model  also  depends  on  the 
amount  of  separation  between  the  two  component 
distributions.  While,  for  sufficient  separation,  the 
mixture  model  has  a  characteristic  bimodal  shape, 
Behboodian (1970) has shown, for example, that a sufficient
condition for the mixture density (of two normal
components) to be unimodal is that
$|\mu_1 - \mu_2| \le 2 \min(\sigma_1, \sigma_2)$. Of
course, in this situation, parameter estimation is
difficult.
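
The effect of separation on the shape of the mixture can be checked numerically. The sketch below assumes the sufficient condition is the one usually attributed to Behboodian (1970), $|\mu_1 - \mu_2| \le 2\min(\sigma_1, \sigma_2)$, and counts local maxima of the mixture density on a grid. The grid range, grid size, and function names are choices made only for this illustration.

```python
import math

def mixture_pdf(x, p, mu1, sigma1, mu2, sigma2):
    """Density of p*N(mu1, sigma1^2) + (1-p)*N(mu2, sigma2^2) at x."""
    def phi(z, s):
        return math.exp(-0.5 * z * z) / (s * math.sqrt(2.0 * math.pi))
    return (p * phi((x - mu1) / sigma1, sigma1)
            + (1.0 - p) * phi((x - mu2) / sigma2, sigma2))

def sufficient_for_unimodal(mu1, sigma1, mu2, sigma2):
    """Sufficient (not necessary) condition for unimodality, as assumed above."""
    return abs(mu1 - mu2) <= 2.0 * min(sigma1, sigma2)

def count_modes(p, mu1, sigma1, mu2, sigma2, steps=4000):
    """Count strict local maxima of the mixture density on an evenly spaced grid."""
    lo = min(mu1, mu2) - 5.0 * max(sigma1, sigma2)
    hi = max(mu1, mu2) + 5.0 * max(sigma1, sigma2)
    ys = [mixture_pdf(lo + (hi - lo) * i / steps, p, mu1, sigma1, mu2, sigma2)
          for i in range(steps + 1)]
    return sum(1 for i in range(1, steps) if ys[i - 1] < ys[i] > ys[i + 1])

print(sufficient_for_unimodal(0.0, 1.0, 1.5, 1.0), count_modes(0.5, 0.0, 1.0, 1.5, 1.0))
print(sufficient_for_unimodal(0.0, 1.0, 4.0, 1.0), count_modes(0.5, 0.0, 1.0, 4.0, 1.0))
```

Because the condition is only sufficient, a mixture that fails it may still be unimodal for some values of p.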

For  purposes  of  quantifying  this  separation  between 
the  components,  we  will  define  a  measure  of  "overlap" 
between  two  distributions.  Without  loss  of  generality  we 


assume that population 1 is centered to the left of
population 2. We define "overlap" to be the probability of
misclassification using the rule:

Classify an observation x as:
population 1 if $x < x_c$
population 2 if $x > x_c$,

where $x_c$ is the unique point between $\mu_1$ and $\mu_2$ such that

$$p f_1(x_c) = (1-p) f_2(x_c).$$

We have based our current study on "overlaps" of .03 and
.10. In Figure 1 we display the mixture densities associated
with normal components and $\sigma_1^2 = \sigma_2^2$. For each mixture,
the scaled components $p f_1(x)$ and $(1-p) f_2(x)$ are also shown. Note
that the densities for p = .75 are not displayed here since
when $\sigma_1^2 = \sigma_2^2$ it follows that
$f^{p}(x) = f^{1-p}(\mu_1 + \mu_2 - x)$ where $f^{h}(x)$
denotes the mixture density associated with a mixing
proportion of h. Thus the shapes of the densities at p = .75
can be inferred from those at p = .25. Likewise, parameter
estimation for p = .75 is not included in the results of the
simulations when $\sigma_1^2 = \sigma_2^2$.
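
The overlap measure defined above can be computed numerically for normal components. The sketch below assumes that the cut-point $x_c$ solves $p f_1(x_c) = (1-p) f_2(x_c)$ between the two means and that the misclassification probability refers to a draw from the mixture, so the two error probabilities are weighted by p and 1-p. The bisection search and function names are choices made for this illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def overlap(p, mu1, sigma1, mu2, sigma2, iters=200):
    """Return (x_c, overlap) for two normal components with mu1 < mu2."""
    def g(x):
        return p * normal_pdf(x, mu1, sigma1) - (1.0 - p) * normal_pdf(x, mu2, sigma2)
    lo, hi = mu1, mu2
    if not (g(lo) > 0.0 > g(hi)):
        raise ValueError("scaled densities do not cross between the means")
    for _ in range(iters):                 # bisection on the sign of g
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    xc = 0.5 * (lo + hi)
    # Population 1 points above x_c and population 2 points below x_c are misclassified
    err = p * (1.0 - normal_cdf(xc, mu1, sigma1)) + (1.0 - p) * normal_cdf(xc, mu2, sigma2)
    return xc, err

xc, ov = overlap(0.5, 0.0, 1.0, 2.0, 1.0)
print(xc, ov)
```

For p = .5 and equal variances the cut-point is the midpoint of the means, so the overlap reduces to $1 - \Phi((\mu_2 - \mu_1)/(2\sigma))$.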

Although both estimation procedures provide estimates of
all 5 of the parameters, only the results for the estimation
of p will be shown since the mixing proportion is the
parameter of primary interest. In addition, when dealing
with the non-normal mixtures, the remaining parameter
estimates often do not have a meaningful interpretation. In
these simulations we have used the procedure discussed in
the previous section to obtain starting values. It should be
noted that although we refer to mixtures of t(4)
distributions here, they are actually mixtures of
distributions associated with the random variable $T' = aT + b$,
where T has a t(4) distribution. These modifications are
made in order to obtain the desired separation and variance
ratios.
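
A sample from a mixture of scaled and shifted t(4) variables of the form $T' = aT + b$ can be generated as in the sketch below. The construction of T as a standard normal divided by the square root of an independent chi-square over its degrees of freedom is standard; the function names, seed handling, and the particular values of a and b are assumptions made for the example, not the values used in the study.

```python
import math
import random

def t_variate(df, rng):
    """Student's t with df degrees of freedom: Z / sqrt(V / df)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(df / 2.0, 2.0)    # chi-square with df degrees of freedom
    return z / math.sqrt(v / df)

def t_mixture_sample(n, p, a1, b1, a2, b2, df=4, seed=0):
    """Draw n values; with probability p use a1*T + b1, otherwise a2*T + b2."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        t = t_variate(df, rng)
        out.append(a1 * t + b1 if rng.random() < p else a2 * t + b2)
    return out

# Hypothetical configuration: p = .25, unit scale, locations 0 and 3
sample = t_mixture_sample(100, 0.25, 1.0, 0.0, 1.0, 3.0)
print(min(sample), max(sample))
```

Since a t(4) variable has variance 2, choosing $a = \sigma/\sqrt{2}$ gives the transformed component variance $\sigma^2$.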

In Table 1 we show the results of the simulation
comparing the performance of the MLE and MDE. In particular,
let $\hat{p}_i$ denote the estimate of p for the ith sample. Then
based upon the simulations, estimates of the bias and MSE
are given by:

$$\text{bias} = \frac{1}{n_s} \sum_{i=1}^{n_s} (\hat{p}_i - p)$$

$$\text{MSE} = \frac{1}{n_s} \sum_{i=1}^{n_s} (\hat{p}_i - p)^2$$

where $n_s$ is the number of samples. It should be noted that
nMSE is the quantity actually given in the table. In
addition, we provide the ratio

$$E = \frac{\text{MSE(MLE)}}{\text{MSE(MDE)}}$$

as an efficiency measure.
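
These summary measures are straightforward to compute from the simulated estimates. The sketch below is a minimal illustration; the function names and the toy inputs are hypothetical and are not values from Table 1.

```python
def bias_and_mse(estimates, p):
    """Simulation estimates of bias and MSE for a list of estimates of p."""
    ns = len(estimates)
    bias = sum(e - p for e in estimates) / ns
    mse = sum((e - p) ** 2 for e in estimates) / ns
    return bias, mse

def efficiency(mle_estimates, mde_estimates, p):
    """E = MSE(MLE) / MSE(MDE); values above 1 favor the MDE."""
    return bias_and_mse(mle_estimates, p)[1] / bias_and_mse(mde_estimates, p)[1]

# Toy inputs for illustration only
mle = [0.30, 0.70, 0.45, 0.55]
mde = [0.40, 0.60, 0.48, 0.52]
print(bias_and_mse(mle, 0.5), bias_and_mse(mde, 0.5), efficiency(mle, mde, 0.5))
```

Multiplying the MSE by the sample size n gives the nMSE scale reported in the table.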

Upon  viewing  the  results,  it  can  be  seen,  as  expected, 
that  the  bias  and  MSE  associated  with  the  MLE  were  generally 
smaller than those for the MDE when the components were
normally distributed. This relationship between the
estimators held for both overlaps. The MLE and MDE were
quite similar at p = .5 while for p = .25 and p = .75 the
superiority of the MLE is more pronounced.

For the t(4) mixtures the relationship between MDE and
MLE is reversed in that the MDE generally has the smaller
bias and MSE. The superiority of the MDE in this case is due
in  part  to  the  heavy  tails  in  the  t(4)  mixture.  The  MLE 
often  interpreted  an  extreme  observation  as  being  the  only 
sample  value  from  one  of  the  populations  with  all  remaining 
observations  belonging  to  the  other.  Due  to  the  well  known 
singularities  associated  with  a  zero  variance  estimate  for  a 
component  distribution,  Day(1969),  we  were  concerned  that 
the  observed  behavior  of  the  MLE  was  due  to  the  fact  that  we 
did  not  constrain  the  variances  away  from  zero. 
However, simulation  results  in  which  equal  variances  were 
assumed  (which  removes  the  singularity)  and  also  those  which 
used  a  penalized  MLE  suggested  by  Redner(1980)  were  very 
similar  to  those  quoted  here. 

Although  the  MSE  is  a  widely  used  measure  among 
statisticians  for  assessing  the  performance  of  an  estimator, 
the  practical  implications,  for  example,  of  an  estimator 
having  an  MSE  three  times  larger  than  that  for  another 
estimator,  may  not  be  immediately  apparent.  Recall  that  each 
MSE quoted in Table 1 is based upon 500 estimates of p. In
order to provide a better appreciation for the practical
impact of differences in MSE, in Figure 2 we display
histograms of the 500 estimates of p associated with three
different MSEs in the table. The true value of p in each
case is p = .5. It is obvious that as the MSE increases, the
performance of the estimator deteriorates. Notice that the
MSE for Figure 2(c) is approximately three times greater
than the MSE associated with Figure 2(a), while the MSE for
Figure 2(b) is approximately twice that for Figure 2(a).
Thus, from these histograms, an intuitive feel for
efficiency ratios of E = 2 and E = 3 can be obtained.

A  very  surprising  result  is  that  the  starting  values 
obtained  using  the  procedure  outlined  in  Section  3  produced 
estimators  which  were  competitive  with  both  the  MLE  and  MDE. 
In  fact,  for  both  the  normal  and  t(4)  mixtures,  the  MSEs 
associated  with  the  starting  values  were  lower  than  those 
for  the  MDE  and  MLE  for  every  parameter  configuration 
associated  with  an  overlap  of  .10.  At  an  overlap  of  .03, 
however, the starting value estimates were generally poorer
than  those  for  the  MDE  and  MLE. 

5.  Mixtures  of  Asymmetric  Distributions 

The  simulation  results  of  the  previous  section  focus  on 
the  performance  of  the  MLE  and  MDE  under  deviations  from  the 
assumption  of  normality.  However,  the  t(4)  distribution  is 
symmetric,  and  recent  studies  have  indicated  that  there  is 
often  a  substantial  asymmetry  in  the  component  distributions 
for  variables  of  interest  in  agricultural  remote  sensing.  A 
Monte  Carlo  examination  of  the  performance  of  the  MDE  and 
MLE,  assuming  normal  components,  when  in  fact  the  component 
distributions were asymmetric, was performed, and the
results  of  this  examination  will  be  discussed  in  this 
section. 

For purposes of our examination, we simulated mixtures
of $\chi^2(9)$ distributions with p = .5. In these simulations the
two distributions differed from each other only by a
location shift. Actually the component distribution to the
left is $\chi^2(9)$ while that to the right is that of a "shifted"
$\chi^2(9)$ with origin no longer at 0. This shift was varied to
provide overlaps of .01, .05, and .10. Since our estimation
procedures involve a normality assumption, we used the means
and variances of the two component $\chi^2(9)$ distributions and
the true mixing proportions as our starting values. The
problem of obtaining starting values from the data in this
case is being examined. In Table 2 we display the results of
this simulation. Only when the two component distributions
were widely separated (overlap = .01) do the two procedures
provide  reasonable  results.  However,  when  the  two  chi-square 
distributions  are  not  widely  separated,  both  estimators  tend 
to  seriously  underestimate  p.  In  Figure  3  we  display  the 
three  mixture  distributions  on  which  these  simulations  were 
based. We see there that it is no surprise that the estimate
of p is less than .5, especially for an overlap of .10. Both estimation
procedures  view  this  as  a  mixture  of  normals,  and  therefore 
make  the  reasonable  interpretation  that  the  density  to  the 
left  has  a  smaller  variance  and  a  mixing  proportion  less 
than  .5.  These  results  point  out  the  impact  which  skewed 
distributions  can  have  on  the  proportion  estimation  in  the 
mixture model when normal mixtures are assumed.
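
The asymmetric setup of this section can be sketched as follows: a chi-square(9) component on the left, the same distribution shifted to the right, and starting values taken from the component moments. The sketch relies on the standard facts that a chi-square variable with k degrees of freedom has mean k and variance 2k. The function names and the shift value are hypothetical; the shifts that yield overlaps of .01, .05, and .10 are not reproduced here.

```python
import random

def chi2_mixture_sample(n, p, shift, df=9, seed=0):
    """Draw n values: chi-square(df) with probability p, else chi-square(df) + shift."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        x = rng.gammavariate(df / 2.0, 2.0)    # chi-square with df degrees of freedom
        out.append(x if rng.random() < p else x + shift)
    return out

def moment_starting_values(p, shift, df=9):
    """(p, mu1, var1, mu2, var2) from the true component means and variances."""
    # A location shift moves the mean of the right component but not its variance
    return p, float(df), 2.0 * df, df + float(shift), 2.0 * df

sample = chi2_mixture_sample(100, 0.5, 10.0)
start = moment_starting_values(0.5, 10.0)
print(start)
```

Both components share the variance 18, so any difference in the fitted variances reflects the skewness of the components rather than a true difference in spread.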

Current  investigation  into  this  area  centers  around 
modifying  the  estimation  procedures  by  assuming  that  the 
underlying  component  distributions  belong  to  some  family  of 
distributions  whose  members  can  be  either  symmetric  or 
asymmetric  depending  on  parameter  configurations.  At  the 
present time, the Weibull distribution is being examined
concerning  its  usefulness. 

6.  Concluding  Results 

We believe that the results of the preceding sections
are of sufficient substance to motivate further research in
the area of MD estimation in the mixture model. Our results
indicate that the MDE is indeed more robust than the MLE in
the  sense  that  it  is  less  sensitive  to  symmetric  departures 
from  the  underlying  assumption  of  normality  of  component 
distributions.  Several  areas  for  future  investigation  have 
already  been  identified  in  addition  to  the  asymmetric 
components  problem  discussed  in  Section  5. 

First,  simulations  similar  to  the  ones  presented  here 
should  be  performed  without  the  assumption  of  only  two 
populations in the mixture. The performance of the MDE and
MLE should be compared when the number of populations is
known and larger than two. In addition the applicability of
the MDE to the problem of estimating the number of
populations also warrants investigation. We plan to examine
these possibilities.

Second, the problem of applying the MDE to the multivariate
setting  is  of  interest.  Preliminary  indications  are  that 
such  an  extension  will  be  possible. 

Third,  the  choice  of  distance  measure  in  the  MDE  is  a 
topic  of  interest.  Our  results  are  not  meant  to  imply  that 
$W^2$ is optimal.

Finally,  the  MDE  and  MLE  must  ultimately  be  compared  on 
real  data.  Several  related  practical  considerations  have  not 
yet  been  investigated.  For  example,  when  applying  these 
estimators  to  LANDSAT  data,  the  number  of  iterations  allowed 
must  be  small  due  to  time  constraints.  In  the  simulations 
described  here,  these  constraints  were  not  imposed  and 
iteration  was  allowed  to  continue  until  convergence  was 
obtained.  The  performance  of  the  MDE  and  MLE  under 
convergence  restrictions  should  be  examined. 